Evolutionary dynamics in finite populations is known to fixate eventually in the absence of mutation.
We here show that a similar phenomenon can be found in stochastic game dynamical batch
learning, and investigate fixation in learning processes in a simple 2x2 game, for two-player games
with cyclic interaction, and in the context of the best-shot network game. The analogues of finite
populations in evolution are here finite batches of observations between strategy updates. We study
when and how such fixation can occur, and present results on the average time-to-fixation from
numerical simulations. Simple cases are also amenable to analytical approaches and we provide
estimates of the behaviour of so-called escape times as a function of the batch size. The differences
and similarities with escape and fixation in evolutionary dynamics are discussed.
I. INTRODUCTION

Modern approaches to game theory have moved beyond the identification of equilibrium points of games
[Nash 1950, Nash 1951, von Neumann & Morgenstern 1944], and instead consider dynamical processes in
populations of agents [Maynard Smith 1998, Nowak 2006, Sigmund 2010] or the adaptation of a given set of
agents to each other's actions [Fudenberg & Levine 1989, Young 2004, Camerer 2003]. The study of
populations of players is the focus of what is now called 'evolutionary game theory' [Maynard Smith 1998,
Vega-Redondo 2003, Gintis 2000]. Within this field two approaches can broadly be distinguished. The more
conventional one describes evolving populations by means of deterministic replicator equations; for textbooks
see e.g. [Hofbauer & Sigmund 1998]. The dynamical behaviour and attractors of these systems are studied
with tools from the theory of nonlinear differential equations. Such formulations are valid formally only for
infinite populations of agents, and systematically neglect stochastic effects in finite populations. The study
of these random processes is at the centre of the second, more recent class of studies in evolutionary game
theory, see for example [Traulsen & Hauert 2009] for a review. Crucial differences between the behaviour of
finite and of infinite populations have been identified; for example, finite systems may fixate at pure-strategy
absorbing states even when the corresponding deterministic replicator equations have their attractors at mixed
equilibria. These stochastic processes are studied with a variety of different tools, including the master equation
formalism, system-size expansions, backwards Fokker-Planck methods, and other concepts from statistical
mechanics [van Kampen 1992, Risken 199?, Gardiner 2009].

The purpose of the present paper is to parallel existing research on stochastic effects in evolutionary systems
with studies of corresponding effects in stochastic learning dynamics. Learning is here related to, but different
from, evolution. Learning, or adaptation, is concerned with a fixed set of players who interact repeatedly
in a given game, and who react to their opponents' actions by modifying their own strategic propensities
[Fudenberg & Levine 1989, Young 2004]. These processes occur on much shorter time scales than evolutionary
dynamics. Adaptation dynamics of the type we study here are of interest in two main contexts. First,
learning models provide mathematical descriptions of human or animal decision making and can be used to
model the outcome of experiments in behavioural game theory and cognitive science [Camerer 2003]. The
second main area in which models of adaptation are relevant is in machine learning and algorithmic game
theory [Nisan et al 2007]. Here the interest is not in the modelling of human behaviour, but instead in the
properties and design of algorithms with which to identify equilibrium points or solutions of optimisation
problems. Understanding the dynamics of learning is of key importance in both of these applications.
In learning there are no birth-death processes as in evolution, but instead dynamical updates of the agents'
strategy profiles in time. Very little work exists on the systematic comparison of the effects of noise in evolution
and in learning. Initial investigations [Galla 2009] have shown that, similar to what is seen in evolutionary
processes, the dynamics and attractors of stochastic learning can be quite different from those of deterministic
adaptation processes. Up to now the analyses of fluctuations in learning are however limited to the identification
of so-called quasi-cycles, also seen in evolution [Bladon et al 2010, Mobilia 2010]. In the present paper we aim
to establish further analogies between the two modelling approaches, and focus in particular on fixation effects
[Antal & Scheuring 2006, Altrock & Traulsen 2009]. Fixation here refers to processes by which dynamical
systems reach absorbing states. In evolution these are typically points at the boundaries of strategy space, at
which only one species (pure strategy) survives, and where all other strategies are extinct. In finite populations
the elimination of species may happen by random drift, and in the absence of mutation a species is then never
introduced again in the dynamics once all its representatives have been removed. The system thus fixates in
an absorbing state.

In this paper we investigate the extent to which a similar removal of strategies may occur in multi-player
learning. The analogue of extinction is here the convergence of a player's strategic propensities to a pure
strategy. The question we address here is when and how stochastic learning fixates. In particular we ask
(i) under what circumstances convergence to pure, rather than mixed, equilibria occurs in learning, (ii) if
fixation occurs, what are the corresponding extinction times, and (iii) given that extinction phenomena are
well known in evolutionary systems, what are the differences and similarities with fixation in learning?
To answer these questions we consider several different types of games. After a general introduction to
learning and the required definitions in Sec. II we first study simple two-person games in Sec. III. We then
turn to games with cyclic interaction in Sec. IV, before we finally discuss a more intricate best-shot game
[Galeotti et al 2010, Dall'Asta et al 2009, Dall'Asta et al 2010] defined on regular random graphs (Sec. V).
The final section summarises our results and discusses possible future work.



II. DETERMINISTIC AND STOCHASTIC LEARNING 



A. General definitions 



In this paper we will consider both two-player and multi-player games. Interaction will occur in learning
processes, in which each player interacts only with a small number $p - 1$ of other agents; in two-player games
we will have $p = 2$, for multi-player games one has $p > 2$. Individual players will typically be labelled by
indices $i \in \{1, \dots, M\}$, where $M$ stands for the total number of players in the model at hand. We will restrict
the discussion to symmetric non-cooperative games. The variable $S$ will indicate the number of pure strategies
available to each of the players. Following the standard game theoretic notation we will write $u(s, s_{-i})$ for
the payoff player $i$ receives when playing pure strategy $s \in \{1, \dots, S\}$, and when her opponents play actions
$s_{-i} \in \{1, \dots, S\}^{p-1}$. This paper focuses only on symmetric games so that $u(\cdot, \cdot)$ is identical for all players,
and carries no explicit dependence on $i$. We will use the notation $x_i$ for player $i$'s mixed strategy, i.e. we have
$x_i = (x_{i,1}, \dots, x_{i,S})$ with $\sum_s x_{i,s} = 1$ for all $i$. The component $x_{i,s}$ indicates the frequency with which player
$i \in \{1, \dots, M\}$ plays pure strategy $s \in \{1, \dots, S\}$.



B. Learning 

We will here focus on a re-inforcement type learning model, and assume that each player keeps a score valuation
of each of his/her pure strategies. These are a measure of the (perceived) relative performance of the pure actions
in the past, and indicate the propensity of playing any particular pure action. Discarding memory-loss, the
valuation $q_{i,s}(t)$ player $i$ has for pure strategy $s$ is the cumulative payoff $i$ would have received in all past rounds
up to time $t$, given his opponents' actions, and had $i$ always played pure strategy $s$ up to time $t$. This will be
detailed further below. Following [Camerer 2003, Chong et al 2007, Sato & Crutchfield 2003, Sato et al 2002]
we will assume that given the score valuation vector $q_i(t) = (q_{i,1}(t), \dots, q_{i,S}(t))$ player $i$ chooses each of the
pure strategies according to a logit rule, i.e. that the probabilities of playing the different pure strategies
depend on the score valuations via the following relations:

$$x_{i,s}(t) = \frac{e^{\beta q_{i,s}(t)}}{\sum_{s'} e^{\beta q_{i,s'}(t)}}. \qquad (1)$$

The variable $\beta$ is here a model parameter, and describes a learning rate or intensity of selection. For $\beta \to \infty$
the players strictly choose the pure action with highest propensity. For $\beta = 0$ they play at random.
A learning dynamics is then a description governing the evolution of the score valuations $q_{i,s}(t)$ in time. We
will here mostly focus on a re-inforcement learning rule of the form

$$q_{i,s}(t+1) = (1-\alpha)\, q_{i,s}(t) + u(s, s_{-i}(t)). \qquad (2)$$

The interpretation of these update rules is understood best by first considering the case $\alpha = 0$: in this case
the increment of $q_{i,s}$ between time-steps $t$ and $t + 1$ is the payoff player $i$ would have received had they played
pure strategy $s$, given their opponents' actions $s_{-i}(t)$. For $\alpha = 0$ the variable $q_{i,s}(t)$ is thus the total payoff
player $i$ would have received given their opponents' play had $i$ always played action $s$. A non-zero value of
$\alpha$ accounts for exponential discounting over time, or equivalently for a possible memory loss. For $\alpha > 0$ the
outcomes of the game in the distant past have a lesser effect on the valuation $q_{i,s}(t)$ than the more recent
rounds of the game.
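
The two rules above can be sketched in a few lines of code. This is only an illustrative sketch: the function names `choice_probabilities` and `update_scores` are our own labels and do not come from the cited literature, and the list `payoffs` is assumed to hold the payoffs $u(s, s_{-i}(t))$ for each pure strategy $s$ against the opponents' realised actions.

```python
import math

def choice_probabilities(q, beta):
    """Logit rule of Eq. (1): x_s is proportional to exp(beta * q_s)."""
    m = max(beta * v for v in q)            # subtract the maximum for numerical stability
    w = [math.exp(beta * v - m) for v in q]
    z = sum(w)
    return [v / z for v in w]

def update_scores(q, payoffs, alpha):
    """Update of Eq. (2): q_s(t+1) = (1 - alpha) * q_s(t) + u(s, s_{-i}(t)).

    payoffs[s] is the payoff the player would have received with pure
    strategy s against the opponents' realised actions in this round.
    """
    return [(1.0 - alpha) * qs + us for qs, us in zip(q, payoffs)]
```

For $\beta = 0$ the first function returns the uniform distribution, and for very large $\beta$ it concentrates on the strategy with the highest score, in line with the two limits described above.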

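The process defined by Eq. (2) is inherently stochastic, given that all players choose their pure actions according
to the probabilistic rules of Eq. (1). A deterministic limit has been considered in [Sato & Crutchfield 2003,
Sato et al 2002, Sato et al 2005], and can be formulated as

$$q_{i,s}(t+1) = (1-\alpha)\, q_{i,s}(t) + \sum_{s_{-i}} u(s, s_{-i})\, x_{-i,s_{-i}}(t), \qquad (3)$$

where $x_{-i,s_{-i}}$ stands for the probability of action $s_{-i} \in \{1, \dots, S\}^{p-1}$ being played by $i$'s opponents, i.e. we
have $x_{-i,s_{-i}} = \prod_{j \neq i} x_{j,s_j}$. Taking into account Eq. (1) one can then write the update rule solely in terms of
the mixed strategies ($x$ and $y$ in the two-player case) and finds the following map [Sato & Crutchfield 2003]

$$x_{i,s}(t+1) = \frac{x_{i,s}(t)^{1-\alpha} \exp\left[\beta \sum_{s_{-i}} u(s, s_{-i})\, x_{-i,s_{-i}}(t)\right]}{\sum_{s'} x_{i,s'}(t)^{1-\alpha} \exp\left[\beta \sum_{s_{-i}} u(s', s_{-i})\, x_{-i,s_{-i}}(t)\right]}. \qquad (4)$$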

To interpolate between the stochastic process defined by Eqs. (1, 2) and the deterministic limit of Eq. (3) we
will consider a batch learning process, in which players update their score valuations only once every $N$ rounds
of the game, and keep them constant in between. Specifically, we will assume

$$q_{i,s}(t+N) = (1-\alpha)\, q_{i,s}(t) + \frac{1}{N} \sum_{t'=t}^{t+N-1} u(s, s_{-i}(t')), \qquad (5)$$

and $q_{i,s}(t+\ell) = q_{i,s}(t)$ for all $\ell = 1, 2, \dots, N-1$. We will refer to $N$ as the batch size of the learning process.
The batch process at $N > 1$ (but finite) is here mostly a theoretical vehicle which allows one to understand
the dynamics of learning. Real-world adaptation presumably operates close to the limit $N = 1$; nevertheless
some of the existing work has focused on deterministic learning ($N \to \infty$). Our work tries to address the gap
between these two extreme cases, and to establish in a systematic manner the stochastic effects affecting the
dynamics at finite batch sizes. The case $N = 1$ can be understood as a special limiting case. Previous work
has shown that approaches based on a systematic expansion in $1/\sqrt{N}$ can give good results even for
small batch sizes [Galla 2009].
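
A single batch update of Eq. (5) can be sketched as follows for a two-player game. This is a minimal illustration under stated assumptions: the helper names `sample_action` and `batch_update` are hypothetical, the payoff is passed as a nested list `payoff[s][s2]` standing for $u(s, s')$, and the opponent's mixed strategy is held fixed during the batch, as the batch rule prescribes.

```python
import random

def sample_action(x, rng):
    """Draw a pure strategy index from the mixed strategy x."""
    r, acc = rng.random(), 0.0
    for s, p in enumerate(x):
        acc += p
        if r < acc:
            return s
    return len(x) - 1  # guard against rounding errors in the cumulative sum

def batch_update(q, x_opponent, payoff, alpha, N, rng):
    """One batch update, Eq. (5), of a player's score valuations (two-player game).

    payoff[s][s2] is u(s, s2). The opponent's action is sampled N times
    from x_opponent, and the batch-averaged payoff is added to the
    discounted scores.
    """
    S = len(q)
    totals = [0.0] * S
    for _ in range(N):
        s_opp = sample_action(x_opponent, rng)
        for s in range(S):
            totals[s] += payoff[s][s_opp]
    return [(1.0 - alpha) * q[s] + totals[s] / N for s in range(S)]
```

For large $N$ the batch average approaches the expected payoff of Eq. (3), while $N = 1$ gives back the fully stochastic rule of Eq. (2).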



C. Sato-Crutchfield dynamics in continuous time 

In order to make contact with deterministic descriptions of evolutionary systems it is helpful to consider the
continuous-time limit of the deterministic learning process, Eq. (3). Assuming the validity of such a limit for
small intensity of selection, $\beta \ll 1$, and following [Sato & Crutchfield 2003, Sato et al 2005] one finds

$$\dot{x}_{i,s} = \beta x_{i,s} \left[ \sum_{s_{-i}} u(s, s_{-i})\, x_{-i,s_{-i}} - \sum_{s', s_{-i}} x_{i,s'}\, u(s', s_{-i})\, x_{-i,s_{-i}} \right] - \alpha x_{i,s} \left[ \log x_{i,s} - \sum_{s'} x_{i,s'} \log x_{i,s'} \right]. \qquad (6)$$

For $\alpha = 0$ this reduces to a set of multi-population replicator equations, a signature of the close connection
between evolutionary processes and adaptive learning.
FIG. 1: (Colour on-line) Deterministic flow of the two-population replicator dynamics for the Hawk-Dove game (arrows).
The noisy green line shows the trajectory of one single realisation of the learning dynamics at batch size $N = 5$, started
at $x(0) = y(0) = 0.1$, and with parameters $\alpha = 0$, $\beta = 0.01$.



III. TWO-PLAYER HAWK-DOVE GAME 



A. Definition and replicator flow 



We will first consider the case of a symmetric 2x2 game, the so-called Hawk-Dove game (also referred to as
the coexistence game or the anti-coordination game) defined by the payoff matrix

$$A = \begin{pmatrix} (b-c)/2 & b \\ 0 & b/2 \end{pmatrix}, \qquad (7)$$

where we set $b = 1$ and $c = 2$. We will label the elements of the payoff matrix by $a_{s,s'}$, where $s$ and $s'$ can
each take one of two values, representing the pure strategies of this game. In the learning process two players
interact repeatedly; the strategy of each player is fully characterized by the probability of playing 'Hawk'. We
will denote these probabilities by $x(t)$ for the first player, and by $y(t)$ for player 2. In the absence of memory
loss, and taking the continuous-time limit of the deterministic learning dynamics, we obtain the two-population
replicator dynamics

$$\dot{x} = x(1-x)\left(\frac{1}{2} - y\right), \qquad \dot{y} = y(1-y)\left(\frac{1}{2} - x\right). \qquad (8)$$



These equations are obtained from setting $\beta = 1$, $\alpha = 0$ in the above Sato-Crutchfield equations (6) and upon
using $u(s_i, s_j) = a_{s_i,s_j}$ with the above payoff matrix (7). It is straightforward to work out the corresponding
deterministic flow; we illustrate it for completeness in Fig. 1. The replicator dynamics has one interior fixed
point at $(x^*, y^*) = (1/2, 1/2)$, and two pure strategy fixed points at $(0, 1)$ and $(1, 0)$. These fixed points at the
boundary of strategy space are stable attractors; the central fixed point is a saddle with one stable and one
unstable eigendirection. The stable eigenvector points along the diagonal, and restricting the dynamics to this
direction (i.e. setting $x = y$) hence yields a stable flow towards the central fixed point. The single-population
replicator equation

$$\dot{x} = x(1-x)\left(\frac{1}{2} - x\right) \qquad (9)$$

therefore converges to $x = 1/2$, provided non-extremal initial conditions ($x(0) \neq 0$, $x(0) \neq 1$) are chosen.
The two-population system will generally fixate at one of the corner attractors for generic initial conditions
$x(0) \neq y(0)$; only in the restricted case $x(0) = y(0)$ is the symmetry between the players preserved and the
dynamics converges to the mixed fixed point.
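
The behaviour of the flow of Eq. (8) for symmetric and asymmetric initial conditions can be checked with a simple numerical integration. The sketch below uses a forward-Euler scheme; the function name `replicator_flow`, the step size and the integration time are illustrative choices of ours, not values taken from the paper.

```python
def replicator_flow(x0, y0, dt=0.01, t_max=200.0):
    """Forward-Euler integration of the two-population replicator dynamics, Eq. (8):
    dx/dt = x(1-x)(1/2 - y),  dy/dt = y(1-y)(1/2 - x)."""
    x, y = x0, y0
    for _ in range(int(t_max / dt)):
        dx = x * (1.0 - x) * (0.5 - y)
        dy = y * (1.0 - y) * (0.5 - x)
        x, y = x + dt * dx, y + dt * dy
    return x, y

# Symmetric start: the trajectory stays on the diagonal and approaches (1/2, 1/2).
print(replicator_flow(0.1, 0.1))
# Asymmetric start: the unstable direction takes over and a corner is approached.
print(replicator_flow(0.1, 0.2))
```

Starting with $x(0) < y(0)$ the trajectory ends near the corner $(0, 1)$; exchanging the two initial values gives the corner $(1, 0)$ by symmetry.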
FIG. 2: (Colour on-line) Fixation in stochastic learning of the Hawk-Dove game ($\alpha = 0$). Left panel shows individual
trajectories for different batch sizes $N$, started at $x(0) = y(0) = 0.1$. Each pair of curves shows $x(t)$ and $y(t)$ for a single
run. The black dashed line indicates the evolution of the deterministic replicator equations. Parameters are $\alpha = 0$
and $\beta = 0.01$. The right panel shows the mean extinction time as a function of the batch size (simulations started
from $x(0) = y(0) = 1/2$, extinction as defined in the main text). Symbols are from simulations (initial conditions
$x(0) = y(0) = 0.1$, averaged over 100 independent runs, $\alpha = 0$, $\beta = 0.01$). The dashed line is a fit to a logarithmic
dependence $T = c_1 + c_2 \ln N$.



B. Fixation in stochastic learning 

We will first address learning dynamics in absence of memory-loss ($\alpha = 0$); the effects of exponential discounting
are described in Sec. III D. Numerical simulations show that stochastic learning without memory-loss will
generally fixate in one of the two corners, $(1, 0)$ or $(0, 1)$, of strategy space. A typical trajectory generated by
the learning dynamics at finite batch size is shown in Fig. 1. This is further illustrated in the left panel of
Fig. 2, where we show the evolution of $x(t)$ and $y(t)$ in stochastic learning at different batch sizes $N$. The
dynamics are here started from $x(0) = y(0)$, and will initially follow the replicator flow closely, and approach,
but not reach, the replicator fixed point at $(1/2, 1/2)$. Fluctuations, which will invariably occur at any finite
batch size, break the symmetry between $x$ and $y$ however, and the system will generally drift off the diagonal
$x = y$ relatively quickly. While the stable eigendirection of the central fixed point still exerts some attraction to
the centre, the unstable direction will eventually take over, and draw the learning process to $(1, 0)$ or $(0, 1)$.
Which one of these corners is reached is purely random, and determined by the nature of sampling errors in the
adaptation process. Large batch sizes here reduce the amount of noise in the dynamics, and the system hence
follows the deterministic flow longer at large $N$ than at smaller batches, as illustrated in the left panel of
Fig. 2. In the right panel of the figure we have measured the time-to-fixation $T$ more systematically. Specifically
we consider the system to be fixated once $x(t)$ and $y(t)$ have each approached the values 0 or 1 up to an
accuracy $\vartheta$, with $\vartheta > 0$ a small threshold ($\vartheta = 10^{-5}$ in Fig. 2). Once this condition is met each player plays
essentially a pure strategy, i.e. the system is close to one of the corners of strategy space up to deviations
smaller than $\vartheta$. We find logarithmic behaviour of the so-defined fixation time, i.e. $T \sim \ln N$. This is consistent
with observations in one-dimensional evolutionary co-ordination games, with one central unstable fixed point
[Nowak 2006, Traulsen & Hauert 2009].
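
The fixation criterion described above can be illustrated with a small simulation of batch learning in the Hawk-Dove game. The sketch below is not the code used for Fig. 2. It assumes $\alpha = 0$, counts time in batch updates (the paper does not spell out its time unit here), and tracks only the score difference between Hawk and Dove, which is sufficient for the logit rule with two strategies. The names `fixation_time`, `PAYOFF` and `sigmoid` are our own.

```python
import math
import random

PAYOFF = [[-0.5, 1.0], [0.0, 0.5]]  # Hawk-Dove matrix of Eq. (7) with b = 1, c = 2

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fixation_time(N, beta, x0=0.1, threshold=1e-5, max_updates=10**6, seed=0):
    """Number of batch updates until both players are within `threshold`
    of a pure strategy; returns None if this does not happen in time.

    Assumptions: alpha = 0, time counted in batch updates.
    dq[k] is the score difference q_Hawk - q_Dove of player k.
    """
    rng = random.Random(seed)
    dq = [math.log(x0 / (1.0 - x0)) / beta] * 2      # gives x(0) = y(0) = x0
    for n in range(1, max_updates + 1):
        x = [sigmoid(beta * d) for d in dq]          # probabilities of playing Hawk
        new_dq = []
        for me in (0, 1):
            opp = 1 - me
            total = 0.0
            for _ in range(N):
                s_opp = 0 if rng.random() < x[opp] else 1   # 0 = Hawk, 1 = Dove
                total += PAYOFF[0][s_opp] - PAYOFF[1][s_opp]
            new_dq.append(dq[me] + total / N)
        dq = new_dq
        probs = [sigmoid(beta * d) for d in dq]
        if all(min(p, 1.0 - p) < threshold for p in probs):
            return n
    return None
```

Averaging the returned value over many seeds for a range of batch sizes gives an estimate of the mean time-to-fixation as a function of $N$.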



It is generally very hard to compute the time-to-fixation of stochastic processes analytically; this applies both
to learning processes and to evolutionary dynamics. In the latter, general analytical results have been obtained
only for one-population models [Nowak 2006, Traulsen & Hauert 2009]. One major complication is here the
fact that the dynamics in most other cases has at least two degrees of freedom, impeding a full analytical
solution.

FIG. 3: (Colour on-line) Two-player learning dynamics in the Hawk-Dove game. The figure shows the escape time from
a ball about the central fixed point (see text for details). Symbols are results from numerical simulations ($\beta = 0.01$,
averaged over 100 samples), solid lines show the theoretical estimates of Eq. (13), dashed lines the approximation
of Eq. (54). The escape time scales logarithmically in the batch size $N$, in line with the existence of an unstable
eigendirection of the limiting deterministic dynamics.

Partial analytical results for game dynamical learning can however be obtained for what we will refer to
as 'escape times' in the following, see also [Mobilia 2010] for studies of escape times in cyclic evolutionary
games. For a given (finite) batch size $N$ we here start the learning dynamics at the deterministic fixed point
$x(0) = y(0) = 1/2$, and run the stochastic dynamics until the system reaches a given distance $R$ from this fixed
point. More precisely we define the escape time $T(R)$ as the time at which the variable $|x(t) - y(t)|/\sqrt{2}$ first
exceeds the value $R$. This measure of distance was chosen for analytical convenience, as will become clear
below. Results from simulations are shown in Fig. 3. Analytical predictions of the escape time for small values
of $R$ are possible within a linear approximation about the central fixed point. Using the methods detailed in
[Galla 2009, Galla 2010] we find that in the limit $\beta \ll 1$ and for large but finite batch size $N$ the two-player
learning dynamics can be described by the following Langevin dynamics

$$\dot{\tilde{x}} = J_{11} \tilde{x} + J_{12} \tilde{y} + \xi,$$
$$\dot{\tilde{y}} = J_{21} \tilde{x} + J_{22} \tilde{y} + \eta, \qquad (10)$$

where the matrix $J$ is the Jacobian of the continuous-time learning dynamics (equivalent to the replicator
equations for the case $\alpha = 0$ we are considering here); specifically we have $J_{11} = J_{22} = 0$, $J_{12} = J_{21} = -\beta/4$ at
vanishing memory loss$^1$. We have here introduced $\tilde{x}(t) = x(t) - 1/2$ and $\tilde{y}(t) = y(t) - 1/2$, and address only the
stationary regime in which deterministic learning has assumed its fixed point. The variables $\tilde{x}$ and $\tilde{y}$ describe
fluctuations about this fixed point; $\xi$ and $\eta$ represent Gaussian white noise, with variances and correlations
given by

$$\langle \xi(t) \xi(t') \rangle = \langle \eta(t) \eta(t') \rangle = \frac{\beta^2}{64 N} \delta(t - t'), \qquad \langle \xi(t) \eta(t') \rangle = 0. \qquad (11)$$

$^1$ For general $\alpha$ one has $J_{11} = J_{22} = -\alpha$.

Given that $(1, -1)$ is an eigenvector of the above Jacobian (with an eigenvalue of $\beta/4$) we then have

$$\dot{d} = \frac{\beta}{4} d + \zeta, \qquad (12)$$

where $d(t) = (\tilde{x}(t) - \tilde{y}(t))/\sqrt{2}$, and where $\langle \zeta(t) \zeta(t') \rangle = \sigma^2 N^{-1} \delta(t - t')$, with $\sigma^2 = \beta^2/64$. Using results for
escape times of general Langevin processes of the form $\dot{x} = \lambda x + \eta$ with $\langle \eta(t) \eta(t') \rangle = (\sigma^2/N) \delta(t - t')$ (see
appendix) we then obtain the following prediction for the escape time

fWN -w / 2 \ 

T(R) = 2 dw —— sinh [w , (13) 

Jq * w \ W N J 

with $w_N = N \lambda R^2 / \sigma^2$. In our specific example we have $\lambda = \beta/4$ and $\sigma^2 = \beta^2/64$. As seen in Fig. 3 this
compares well with numerical simulations.
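
For readers who wish to evaluate escape times of this type numerically, the sketch below computes the mean first-passage time of the linear Langevin process $\dot{x} = \lambda x + \eta$, $\langle \eta(t) \eta(t') \rangle = (\sigma^2/N) \delta(t - t')$, from $x = 0$ to $|x| = R$ for $\lambda > 0$. It uses the standard backward Fokker-Planck result $T = (2/\lambda) \int_0^{\sqrt{w_N}} F(u)\, du$ with $F(u) = e^{-u^2} \int_0^u e^{v^2} dv$ and $w_N = N \lambda R^2/\sigma^2$. We state this textbook expression as an assumption; it is not necessarily written in the same form as Eq. (13). The function name `escape_time` and the discretisation are our own choices.

```python
import math

def escape_time(lam, sigma2, N, R, steps_per_unit=200):
    """Mean first-passage time from 0 to |x| = R for dx/dt = lam*x + noise,
    with noise correlator (sigma2/N)*delta(t-t') and lam > 0.

    Uses T = (2/lam) * integral_0^a F(u) du with a = sqrt(N*lam*R^2/sigma2),
    where F(u) = exp(-u^2) * integral_0^u exp(v^2) dv solves F' = 1 - 2uF, F(0) = 0.
    """
    a = math.sqrt(N * lam * R * R / sigma2)
    n = max(1000, int(steps_per_unit * a))
    h = a / n
    F, integral = 0.0, 0.0
    for k in range(n):
        u0, u1 = k * h, (k + 1) * h
        # implicit trapezoidal step for F' = 1 - 2uF (remains stable for large u)
        F_next = (F * (1.0 - h * u0) + h) / (1.0 + h * u1)
        integral += 0.5 * h * (F + F_next)   # trapezoidal rule for the outer integral
        F = F_next
    return 2.0 * integral / lam

# Parameters of the learning example in the text: lambda = beta/4, sigma^2 = beta^2/64.
beta, R = 0.01, 0.1
for N in (10, 100, 1000):
    print(N, escape_time(beta / 4.0, beta ** 2 / 64.0, N, R))
```

For large $w_N$ the integrand behaves as $1/(2u)$, so this expression grows as $(2\lambda)^{-1} \ln w_N$, which reproduces the logarithmic dependence on $N$ discussed in the text.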



C. Comparison with evolutionary dynamics in finite populations 

We have already indicated that the behaviour of the stochastic learning dynamics is, to an extent, similar 
to evolutionary processes. To quantify this further we investigate both one-population and two-population 
stochastic evolutionary processes in this section. 



1. Two-population dynamics 

Specifically we consider two populations, each composed of $N$ players. Each of these players will either be a
Hawk or a Dove; we denote the number of Hawks in the first population by $i$, and the number of Hawks in
the second population by $j$ respectively. The corresponding numbers of Doves are then $N - i$ and $N - j$ in
the two populations. Players in the first population only play against players of the second population, and
vice versa. The fitnesses of Hawk and Dove players in the first population are then for example given by

$$f_{1,H} = 1 - \frac{3j}{2N}, \qquad f_{1,D} = \frac{1}{2} - \frac{j}{2N}, \qquad (14)$$

and similar definitions hold for individuals in the second population. In order to specify a microscopic dynamics
we use the so-called 'local update rule', sometimes also referred to as the 'pairwise comparison process'
[Traulsen & Hauert 2009]. A player of the 'Dove' type is converted into a 'Hawk' player with a rate proportional
to $1 + f_{m,H} - f_{m,D}$, where $m = 1, 2$ labels the two populations. Similarly conversions of Hawk players into Dove
players occur with a rate proportional to $1 + f_{m,D} - f_{m,H}$. Specifically, we will use the following transition
rates

$$T_1^+ = \frac{1}{2} \frac{i}{N} \frac{N-i}{N} \left( \frac{3}{2} - \frac{j}{N} \right), \qquad T_2^+ = \frac{1}{2} \frac{j}{N} \frac{N-j}{N} \left( \frac{3}{2} - \frac{i}{N} \right),$$
$$T_1^- = \frac{1}{2} \frac{i}{N} \frac{N-i}{N} \left( \frac{1}{2} + \frac{j}{N} \right), \qquad T_2^- = \frac{1}{2} \frac{j}{N} \frac{N-j}{N} \left( \frac{1}{2} + \frac{i}{N} \right). \qquad (15)$$

The factors of the form $i(N - i)/N^2$ here indicate that two players of different types need to be drawn from
any one population in order for an interaction to occur. It is here important to stress that reproduction and
selection occur within the separate populations, i.e. at no point is an individual of one population converted
into a member of the other. Interaction between the populations occurs via Eq. (14), i.e. the fitness of members
of population one depends on the composition of population two and vice versa.
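
The microscopic process defined by the rates of Eq. (15) can be simulated directly. The sketch below is illustrative only: it treats the rates as per-step transition probabilities, lets each population attempt one update per Monte Carlo step based on the composition before the step (an assumption about the update ordering, which the text does not specify), and starts from $i = j = N/2$. The names `rates`, `step` and `fixation_time` are ours.

```python
import random

def rates(i, j, N):
    """Transition probabilities of Eq. (15) for a population with i Hawks,
    given j Hawks in the other population. Returns (T_plus, T_minus)."""
    pair = 0.5 * (i / N) * ((N - i) / N)
    return pair * (1.5 - j / N), pair * (0.5 + j / N)

def step(i, j, N, rng):
    """One Monte Carlo step: both populations attempt one update each,
    using the composition at the start of the step."""
    new = []
    for own, other in ((i, j), (j, i)):
        t_plus, t_minus = rates(own, other, N)
        r = rng.random()
        if r < t_plus:
            own += 1          # a Dove is converted into a Hawk
        elif r < t_plus + t_minus:
            own -= 1          # a Hawk is converted into a Dove
        new.append(own)
    return new[0], new[1]

def fixation_time(N, seed=0, max_steps=10**7):
    """Steps until both populations are monomorphic, plus the final state."""
    rng = random.Random(seed)
    i = j = N // 2
    for n in range(1, max_steps + 1):
        i, j = step(i, j, N, rng)
        if i in (0, N) and j in (0, N):
            return n, (i, j)
    return None
```

Since the prefactor $i(N - i)/N^2$ vanishes at $i = 0$ and $i = N$, monomorphic populations are absorbing, as required in the absence of mutation.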
In the deterministic limit ($N \to \infty$) one recovers the two-population replicator dynamics

$$\dot{x} = T_1^{+,\infty} - T_1^{-,\infty} = \frac{1}{2} x(1-x)(1-2y), \qquad \dot{y} = T_2^{+,\infty} - T_2^{-,\infty} = \frac{1}{2} y(1-y)(1-2x), \qquad (16)$$

where we have used the replacements $i/N \to x$ and $j/N \to y$ in Eqs. (15) to obtain the $T_m^{\pm,\infty}$. The deterministic
flow of these replicator equations is the one indicated in Fig. 1; in particular the central fixed point has one
stable and one unstable eigendirection. Fixation of the stochastic evolutionary dynamics can occur at any of
the four corners of strategy space. We show results for the average time-to-fixation in the inset of Fig. 4;
the fixation time depends logarithmically on the system size $N$. As in the learning dynamics, an analytical
calculation of the fixation time is very difficult. Estimates for the escape times can however be obtained within
a system-size expansion about the fixed point of the deterministic replicator equations. Following standard
methods based on the so-called 'van Kampen expansion' in the inverse system size [van Kampen 1992] one
finds

$$\dot{\tilde{x}} = -\frac{1}{4} \tilde{y} + \xi(t),$$
$$\dot{\tilde{y}} = -\frac{1}{4} \tilde{x} + \eta(t), \qquad (17)$$
FIG. 4: (Colour on-line) Two-population evolutionary dynamics in the Hawk-Dove game: Main panel shows the escape
time from a region about the central fixed point (see text for details). Symbols are results from numerical simulations
(averaged over 1000 samples), solid lines show the theoretical estimates of Eq. (13), dashed lines the approximation
of Eq. (54). The escape time scales logarithmically in the system size $N$, in line with the existence of an unstable
eigendirection of the limiting deterministic dynamics. The inset shows the fixation time as a function of the system
size (average over 100 runs).



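where $\tilde{x}(t) = i/N - 1/2$ and $\tilde{y}(t) = j/N - 1/2$. As before $\xi(t)$ and $\eta(t)$ describe Gaussian noise; from the van Kampen
expansion one finds $\langle \xi(t) \eta(t') \rangle = 0$ as well as

$$\langle \xi(t) \xi(t') \rangle = \langle \eta(t) \eta(t') \rangle = \frac{1}{4N} \delta(t - t'). \qquad (18)$$

This translates into a Langevin equation

$$\dot{d} = \frac{1}{4} d + \zeta \qquad (19)$$

for the variable $d(t) = (\tilde{x}(t) - \tilde{y}(t))/\sqrt{2}$, with $\langle \zeta(t) \zeta(t') \rangle = \frac{1}{4N} \delta(t - t')$. A theoretical prediction for the escape
time can hence be found using the values $\lambda = \sigma^2 = 1/4$ in Eq. (13). Results are tested against simulations
and confirmed successfully in Fig. 4.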



2. One-population dynamics 



In the one-population model one considers a single population of $N$ individuals, each of whom can either be
a Hawk or a Dove player. The state of the system is hence characterized by a single integer, the number $i$ of
Hawks. Transition rates of the local process read

$$T^+ = \frac{1}{2} \frac{i}{N} \frac{N-i}{N} \left( \frac{3}{2} - \frac{i}{N} \right), \qquad T^- = \frac{1}{2} \frac{i}{N} \frac{N-i}{N} \left( \frac{1}{2} + \frac{i}{N} \right). \qquad (20)$$

The analysis of this model is not new as such; the study of single-population dynamics of 2 x 2 games is in
fact standard, see for example [Traulsen & Hauert 2009, Altrock & Traulsen 2009]. We here present results
mainly for completeness and in order to contrast them with the above two-population case.
FIG. 5: (Colour on-line) One-population evolutionary dynamics in the Hawk-Dove game: main panel shows the escape
time from the central fixed point (defined as the point in time at which the quantity $|\tilde{x}(t)|$, with $\tilde{x}(t)$ as defined in
the text, first exceeds the value $R$). Symbols are results from numerical simulations (averaged over 20-50 samples),
solid lines show the theoretical results of Eq. (13), the dashed lines the asymptotic approximation of Eq. (56). The
escape time scales exponentially in the system size $N$, in line with the existence of a stable eigendirection of the limiting
deterministic dynamics. The inset shows the fixation time as a function of the system size; symbols are from simulations
(average over 100 runs), the solid line from an analytical calculation based on the methods and expressions detailed in
[Traulsen & Hauert 2009].



In the deterministic limit ($N \to \infty$) the following replicator equation is obtained from Eq. (20):

$$\dot{x} = T^{+,\infty} - T^{-,\infty} = \frac{1}{2} x(1-x)(1-2x). \qquad (21)$$

This corresponds to restricting the two-population replicator equations (16) to the subspace in which $x(t) =
y(t)$. In order to explore stochastic corrections to this limiting behaviour in next-to-leading order we again
carry out the system-size expansion. As before we do not report the detailed mathematics, which is tedious,
but standard. Defining $\tilde{x}(t) = i/N - 1/2$ one finds

$$\dot{\tilde{x}}(t) = -\frac{1}{4} \tilde{x}(t) + \eta(t), \qquad (22)$$

where $\langle \eta(t) \eta(t') \rangle = \frac{1}{4N} \delta(t - t')$. Setting $\lambda = -1/4$ and $\sigma^2 = 1/4$ in Eq. (13) we obtain semi-analytical
predictions for the escape time. These results are compared with simulations in Fig. 5. As seen in the figure the
escape time no longer scales logarithmically in the system size as was the case in the two-population model,
but instead the escape is now exponentially slow in the asymptotic limit of large $N$. We have also measured
the actual fixation time (see inset of Fig. 5). Fixation times scale exponentially in the system size. Analytical
results can here be obtained based on the methods described for example in [Traulsen & Hauert 2009]. For
completeness we show the results of these calculations in the inset of Fig. 5.
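
Mean fixation times of a one-dimensional birth-death chain can also be computed exactly by first-step analysis. The sketch below is a generic illustration of that route, not necessarily the closed-form expressions of [Traulsen & Hauert 2009]. It assumes the rates are used as per-step transition probabilities of the form $T^\pm = \frac{1}{2} \frac{i}{N} \frac{N-i}{N} (\frac{3}{2} - \frac{i}{N})$ and $(\frac{1}{2} + \frac{i}{N})$, which are the ones consistent with Eq. (21), and it solves $t_i = 1 + T^+_i t_{i+1} + T^-_i t_{i-1} + (1 - T^+_i - T^-_i) t_i$ with $t_0 = t_N = 0$. The function name `mean_fixation_times` is ours.

```python
def mean_fixation_times(N):
    """Mean number of steps to absorption at i = 0 or i = N, for every
    initial number of Hawks i, in the one-population local update process.

    Solves the tridiagonal system
        (T+_i + T-_i) t_i - T+_i t_{i+1} - T-_i t_{i-1} = 1,  t_0 = t_N = 0,
    with the Thomas algorithm.
    """
    n = N - 1                                  # unknowns t_1 ... t_{N-1}
    lower, diag, upper, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [1.0] * n
    for k in range(n):
        x = (k + 1) / N
        pair = 0.5 * x * (1.0 - x)
        t_plus, t_minus = pair * (1.5 - x), pair * (0.5 + x)
        diag[k] = t_plus + t_minus
        lower[k] = -t_minus                    # coefficient of t_{i-1}
        upper[k] = -t_plus                     # coefficient of t_{i+1}
    for k in range(1, n):                      # forward elimination
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]
    t = [0.0] * n
    t[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):             # back substitution
        t[k] = (rhs[k] - upper[k] * t[k + 1]) / diag[k]
    return [0.0] + t + [0.0]

# Fixation time from the mixed state i = N/2 for a few system sizes.
for N in (10, 20, 40):
    print(N, mean_fixation_times(N)[N // 2])
```

The printed values grow rapidly with $N$, in line with the exponential scaling described above.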



D. Effects of memory-loss in two-player learning 

Unlike in evolutionary dynamics, where fixation can occur in absence of mutation purely by random drift,
fixation in stochastic game learning is strictly tied to the convergence of the limiting deterministic learning
process to pure strategy equilibria. In order to demonstrate this we will extend the analysis to non-zero
memory-loss rates $\alpha > 0$ in the following. Deterministic learning of the Hawk-Dove game in discrete time
[Figure 6: panels for $\alpha = 0$, $\alpha = 0.02$ and $\alpha = 0.03$; horizontal axes: $t$]

FIG. 6: (Colour on-line) Effects of memory-loss on deterministic and stochastic learning in the Hawk-Dove game. The
upper row shows the outcome of the deterministic dynamics at $\beta = 0.1$ for different values of the memory-loss parameter
$\alpha$. In each panel we show five trajectories $(x(t), y(t))$ obtained from five independent random initial conditions. Lower
row: single runs $(x(t), y(t))$ of stochastic learning, started from random initial condition. The critical value $\alpha_c$ separating
the regime of a stable central fixed point ($\alpha > \alpha_c$) from an unstable regime ($\alpha < \alpha_c$) is given by $\alpha_c = \beta/4 = 0.025$.

is then described by the two-dimensional map given by Eq. (4) (with the appropriate substitutions for the
payoff structure). The point $(x^*, y^*) = (1/2, 1/2)$ is a fixed point and the relevant eigenvalues are identified
as $\lambda = (1 - \alpha) \pm \beta/4$. Assuming $0 < \alpha < 1$ (and $\alpha < 2 - \beta/4$) we therefore find that the central fixed point
is stable whenever $\alpha > \alpha_c = \beta/4$. In order to characterise the outcome of learning we have to distinguish
between three different types of behaviour$^2$:

(1) For a = the central fixed point is not a stable attractor of the deterministic learning process. In this 
regime (1, 1) is still a stable eigendirection, so deterministic learning will converge to (1/2, 1/2) provided 
it is started from symmetric initial conditions (x(0) = y(0)). For generic initial conditions this symmetry 
is broken however, and the dynamics is observed to approach either (1, 0) or (0, 1) asymptotically. Noise 
in learning has a similar symmetry-breaking effect, and will drive the dynamics to one of the pure strategy 
attractors. 

(2) For a > 0, but a < a c the central fixed point is again not a stable attractor of the learning dynamics, and 
deterministic learning will converge to (1/2, 1/2) only if started from symmetric initial conditions. For 
asymmetric initial conditions the dynamics will approach an asymmetric fixed point (x* ^ y*), which 
is generally not a pure strategy for a > 0. With noise learning fluctuates around this symmetry-broken 
attractor. Memory-loss in learning thus acts similar to mutation in evolutionary dynamics, and impedes 
absorption at the boundaries. 



2 In the context of this paper these are mostly empirical observations, made in simulations and from numerical iteration of the
deterministic learning process.

(3) For $\alpha > \alpha_c$ deterministic learning converges to (1/2, 1/2) even for non-symmetric initial conditions. In
this case there is no fixation; the dynamics of stochastic learning will fluctuate around the mixed-strategy
equilibrium asymptotically.

This behaviour is illustrated further in Fig. 6.
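
The three regimes can be summarised in a few lines of code. The following is a minimal sketch rather than the implementation used for the figures: the function names are hypothetical, and the eigenvalue expression $\lambda = (1 - \alpha) \pm \beta/4$ is the one quoted above for the central fixed point.

```python
def central_eigenvalues(alpha, beta):
    """Eigenvalues of the deterministic map at the central fixed point (1/2, 1/2)."""
    return (1 - alpha) + beta / 4, (1 - alpha) - beta / 4


def is_stable(alpha, beta):
    """A fixed point of a discrete-time map is stable if all |lambda| < 1."""
    return all(abs(lam) < 1 for lam in central_eigenvalues(alpha, beta))


def regime(alpha, beta):
    """Return 1, 2 or 3 for the regimes listed above (assumes 0 <= alpha < 1)."""
    alpha_c = beta / 4
    if alpha == 0:
        return 1  # flow towards the pure-strategy attractors (1, 0) or (0, 1)
    if alpha < alpha_c:
        return 2  # symmetry-broken interior attractor
    if alpha > alpha_c:
        return 3  # stable central fixed point, no fixation
    raise ValueError("alpha equals alpha_c: marginal case")
```

For $\beta = 0.1$ the boundary between regimes (2) and (3) lies at $\alpha_c = 0.025$, the value quoted in the caption of Fig. 6.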

IV. ESCAPE RATES IN CYCLIC GAMES 

We now consider a two-player discrete-time learning dynamics in the rock-paper-scissors game (RPS). Detailed
analyses of evolutionary processes in this cyclic game can for example be found in [Mobilia 2010]. We here
focus on learning, and first concentrate on the deterministic limit. Specifically, using the deterministic limit 
of Eqs. Q we have the following map 

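$$x_s(t+1) = \frac{x_s(t)^{1-\alpha}\, e^{\beta R_s(t)}}{\sum_{s'} x_{s'}(t)^{1-\alpha}\, e^{\beta R_{s'}(t)}}, \qquad y_s(t+1) = \frac{y_s(t)^{1-\alpha}\, e^{\beta S_s(t)}}{\sum_{s'} y_{s'}(t)^{1-\alpha}\, e^{\beta S_{s'}(t)}}, \qquad (23)$$

where

$$R_s(t) = \sum_{s'} a_{ss'}\, y_{s'}(t), \qquad S_s(t) = \sum_{s'} a_{ss'}\, x_{s'}(t), \qquad (24)$$

and where $A = (a_{ss'})$ ($s, s' \in \{R, P, S\}$) is the standard RPS payoff matrix, i.e.

$$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}. \qquad (25)$$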




Due to overall normalisation, $\sum_s x_s = \sum_s y_s = 1$, the above map defines a 4-dimensional dynamical system.
The mixed strategy point $x_s = y_s = 1/3$ for all s is always a fixed point, and the corresponding 4x4 Jacobian
is easily computed. One finds the following eigenvalues

$$\lambda = (1 - \alpha) \pm i\,\frac{\beta}{\sqrt{3}}, \qquad (26)$$

each with degeneracy 2. Thus, the central fixed point is stable if and only if $(1 - \alpha)^2 + \beta^2/3 < 1$. For a fixed
choice of $\beta$ one therefore has stability for $\alpha > \alpha_c = 1 - \sqrt{1 - \beta^2/3}$, and an unstable fixed point otherwise.
This separation of two regimes, one with a stable fixed point, and the other with a deterministic flow away from
the centre of strategy space, is reflected in the escape times of stochastic learning. Results are shown in Fig. 7.
In our simulations the stochastic learning dynamics is started at the fixed point $x = (1/3, 1/3, 1/3)$, $y =
(1/3, 1/3, 1/3)$ at the centre of the strategy simplex and evolved at finite batch size N. The system does not
fixate into one pure strategy, so the escape time is measured as the point in time when the 6-dimensional vector
(x, y) first crosses a sphere of radius $R = 0.1$ around the fixed point. As seen in the figure the escape time
scales sublinearly with the batch size if the fixed point is unstable ($\alpha < \alpha_c$). For neutrally stable deterministic
dynamics ($\alpha = \alpha_c$) algebraic scaling is found, and escape is slower still, growing super-linearly with the batch
size, in the regime of a stable fixed point ($\alpha > \alpha_c$).
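
The change of stability at $\alpha_c$ can be checked by iterating the deterministic map directly. The sketch below is an illustration under stated assumptions, not the code used for Fig. 7: it assumes that the map of Eq. (23) has the normalised form $x_s(t+1) \propto x_s(t)^{1-\alpha} e^{\beta R_s(t)}$ (and likewise for $y_s$ with $S_s$), uses the standard RPS payoff matrix with zeros on the diagonal, and all function names are hypothetical. It starts from a small perturbation of the central fixed point and reports the distance from the centre after a number of iterations.

```python
import math

# Standard RPS payoff matrix, rows and columns ordered (R, P, S).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def step(x, y, alpha, beta):
    """One iteration of the deterministic map (assumed form of Eqs. (23)-(24))."""
    R = [sum(A[s][sp] * y[sp] for sp in range(3)) for s in range(3)]
    S = [sum(A[s][sp] * x[sp] for sp in range(3)) for s in range(3)]
    wx = [x[s] ** (1 - alpha) * math.exp(beta * R[s]) for s in range(3)]
    wy = [y[s] ** (1 - alpha) * math.exp(beta * S[s]) for s in range(3)]
    zx, zy = sum(wx), sum(wy)
    return [w / zx for w in wx], [w / zy for w in wy]


def critical_alpha(beta):
    """Stability threshold alpha_c = 1 - sqrt(1 - beta^2 / 3)."""
    return 1 - math.sqrt(1 - beta ** 2 / 3)


def distance_from_centre(x, y):
    """Euclidean distance of the 6-dimensional vector (x, y) from the centre."""
    return math.sqrt(sum((v - 1 / 3) ** 2 for v in x + y))


def distance_after(alpha, beta, steps=500, eps=1e-4):
    """Distance from the centre after `steps` iterations of a perturbed start."""
    x = [1 / 3 + eps, 1 / 3 - eps, 1 / 3]
    y = [1 / 3, 1 / 3 + eps, 1 / 3 - eps]
    for _ in range(steps):
        x, y = step(x, y, alpha, beta)
    return distance_from_centre(x, y)
```

With $\beta = 0.1$ the initial distance is $2 \cdot 10^{-4}$; it should grow for a value such as $\alpha = 0.0005 < \alpha_c$ and shrink for $\alpha = 0.003 > \alpha_c$.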

V. NETWORK GAMES 

A. Definition of the game 

We will now move to a more complex multi-player game defined on a networked structure, and consider the
so-called 'best shot game' [Galeotti et al 2010]. Analyses of the statistics of Nash equilibria on random graphs
can be found in [Dall'Asta et al 2009, Dall'Asta et al 2010]. We here again focus on adaptive learning. Players
are labelled by $i = 1, \ldots, M$ and arranged on an undirected graph, so that players i and j interact if and only
if the link between i and j is present in the graph. In the 'best shot' game each player has the choice between
two actions, to 'contribute' or not to contribute. For simplicity we will refer to these actions as 1 and 0
respectively. The payoff any given player i receives in any round of the game then depends on her action and

FIG. 7: (Colour on-line) Mean escape times in the learning of rock-paper-scissors at a fixed value of $\beta = 0.1$ as a
function of the batch size. Symbols are from numerical simulations, averaged over 100 independent runs. The mean
escape time is sub-linear in N if the fixed point is unstable ($\alpha < \alpha_c$), and super-linear for $\alpha > \alpha_c$ (stable fixed point). If
the central fixed point is neutrally stable ($\alpha = \alpha_c$) the escape time scales linearly in the batch size N. The dashed line
is a fit to a power law of the data at $\alpha = \alpha_c$ and reveals a linear scaling $T \sim N^{0.98}$. For the present choice of $\beta = 0.1$
one has $\alpha_c \approx 1.668 \cdot 10^{-3}$.

on the actions of her neighbours on the underlying network. If we write $\partial i$ for the set of neighbours of player
i then the best-shot game is defined by the following payoff structure for action $s_i = 1$

$$u(s_i = 1, s_{\partial i}) = \begin{cases} a & \text{if } s_j = 0 \;\; \forall j \in \partial i \\ 0 & \text{if } \exists j \in \partial i : s_j = 1 \end{cases} \qquad (27)$$

and by payoffs for action $s_i = 0$

$$u(s_i = 0, s_{\partial i}) = \begin{cases} 0 & \text{if } s_j = 0 \;\; \forall j \in \partial i \\ b & \text{if } \exists j \in \partial i : s_j = 1. \end{cases} \qquad (28)$$

The variables a and b are positive constants. To a certain extent the game resembles the typical structure
of public goods games. In absence of any contributors in player i's neighbourhood ($\sum_{j \in \partial i} s_j = 0$) player
i will increase her payoff by contributing. If however at least one of her neighbours is contributing already
($\sum_{j \in \partial i} s_j > 0$), then player i will not want to contribute herself.
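
As an illustration, the payoff rule can be written as a short function. This is a minimal sketch, assuming the payoffs read: a for a lone contributor, 0 for a contributor with at least one contributing neighbour, b for a non-contributor with at least one contributing neighbour, and 0 if nobody in the neighbourhood contributes. The function name and argument layout are hypothetical.

```python
def best_shot_payoff(own_action, neighbour_actions, a, b):
    """Payoff of one player in the best-shot game.

    own_action is 1 (contribute) or 0 (do not contribute);
    neighbour_actions is an iterable with the 0/1 actions of the neighbours.
    """
    someone_contributes = any(s == 1 for s in neighbour_actions)
    if own_action == 1:
        return 0 if someone_contributes else a
    return b if someone_contributes else 0
```

With a = 7 and b = 1, a player whose neighbours all play 0 earns 7 by contributing and 0 otherwise, whereas a player with a contributing neighbour earns 1 by not contributing and 0 by contributing, so the best response is to contribute only if no neighbour does.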




B. Sato-Crutchfield equations and homogeneous fixed point 

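We will write $x_{i,s}(t)$ with $i = 1, \ldots, M$ and $s \in \{0, 1\}$ for the probability with which player i takes action
s at time t. One always has $x_{i,0}(t) = 1 - x_{i,1}(t)$. In the continuous-time limit one obtains the following
deterministic equation

$$\frac{\dot{x}_{i,s}}{x_{i,s}} = \beta \left[ \sum_{s_{\partial i}} u(s, s_{\partial i}) \prod_{j \in \partial i} x_{j,s_j} - \sum_{s' \in \{0,1\}} x_{i,s'} \sum_{s_{\partial i}} u(s', s_{\partial i}) \prod_{j \in \partial i} x_{j,s_j} \right] - \alpha \left[ \ln x_{i,s} - \sum_{s' \in \{0,1\}} x_{i,s'} \ln x_{i,s'} \right]. \qquad (29)$$
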
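Taking into account that

$$\sum_{s_{\partial i}} u(1, s_{\partial i}) \prod_{j \in \partial i} x_{j,s_j} = a \prod_{j \in \partial i} (1 - x_{j,1}), \qquad (30)$$

$$\sum_{s_{\partial i}} u(0, s_{\partial i}) \prod_{j \in \partial i} x_{j,s_j} = b \left[ 1 - \prod_{j \in \partial i} (1 - x_{j,1}) \right], \qquad (31)$$

we can rewrite the equations above in terms of the parameters $x_i \equiv x_{i,1}$:

$$\frac{\dot{x}_i}{x_i} = \beta (1 - x_i) \left[ (a + b) \prod_{j \in \partial i} (1 - x_j) - b - r \ln \frac{x_i}{1 - x_i} \right], \qquad (32)$$
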

where we have introduced $r = \alpha/\beta$. Up to now all derivations hold for any network structure. In order to keep
the analytical expressions to a sensible level we will from now on restrict the analysis to regular graphs, i.e.
to graphs in which all players have the same number of neighbours. We will denote the degree of the resulting
regular network by K. Looking for homogeneous fixed-point solutions of the above continuous-time dynamics,
i.e. setting $x_i(t) = x$ for all players i, one finds



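$$\frac{\dot{x}}{x} = \beta (1 - x) \left[ (a + b)(1 - x)^K - b + r \ln \frac{1 - x}{x} \right]. \qquad (33)$$

Excluding trivial fixed points, i.e. assuming $x \neq 0$ and $x \neq 1$, one obtains

$$(a + b)(1 - x)^K + r \ln \frac{1 - x}{x} = b. \qquad (34)$$
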



The solutions to this equation give the possible fixed points of the deterministic learning dynamics. In the
cases studied here we will typically have one internal fixed point, $x^*$; its numerical value will generally depend
on the model parameters r, a and b. For $a = b$ and $r = 0$ we recover the homogeneous mixed Nash equilibrium
$x_0 = 1 - (1/2)^{1/K}$. Irrespective of a and b one finds $x^* \to 1/2$ for $r \gg 1$.
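
The internal fixed point is easily obtained numerically. The following is a minimal sketch, assuming the fixed-point condition $(a + b)(1 - x)^K + r \ln[(1 - x)/x] = b$ with $a, b > 0$ and $r \geq 0$. Under these assumptions the left-hand side decreases monotonically in x on (0, 1), so a simple bisection suffices. The function names are hypothetical.

```python
import math


def fixed_point_residual(x, a, b, r, K):
    """Left-hand side minus right-hand side of the fixed-point condition."""
    return (a + b) * (1 - x) ** K + r * math.log((1 - x) / x) - b


def homogeneous_fixed_point(a, b, r, K, tol=1e-12):
    """Internal fixed point x* in (0, 1), located by bisection."""
    lo, hi = 1e-12, 1 - 1e-12  # residual is positive at lo and negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fixed_point_residual(mid, a, b, r, K) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `homogeneous_fixed_point(1, 1, 0, 3)` reproduces $1 - (1/2)^{1/3}$, while for a = 7, b = 1 and K = 3 the result is 1/2 for any value of r.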



C. Stability analysis 

Expanding Eq. (32) around the fixed point, $x_i = x^* + \tilde{x}_i$, to linear order one finds

$$\dot{\tilde{x}}_i = -\beta \left[ \rho \sum_{j \in \partial i} \tilde{x}_j + r\, \tilde{x}_i \right], \qquad (35)$$

where $\rho = (a + b)\, x^* (1 - x^*)^K$. Eq. (35) can then be written in matrix form as

$$\dot{\tilde{\mathbf{x}}} = -\beta \left( \rho A + r I \right) \tilde{\mathbf{x}}, \qquad (36)$$

where A is the adjacency matrix of the graph and I the M x M identity matrix. Diagonalizing this equation
is equivalent to diagonalizing A. In particular the critical value of r, separating the phase $r > r_c$ in which the
fixed point $x^*$ is stable from a phase with an unstable fixed point ($r < r_c$), is given by

$$r_c = -\mu \rho, \qquad (37)$$


where $\mu$ is the smallest eigenvalue of the adjacency matrix A (all eigenvalues are real since A is symmetric).
In analogy with the earlier sections we expect that the escape time of learning will scale logarithmically in
the batch size for $r < r_c$, when the interior fixed point is unstable. In the stable phase ($r > r_c$) on the other
hand one would predict an exponential behaviour. We verify these predictions in the following section. As a
final remark regarding stability it is interesting to consider the limiting case of deterministic learning started
from homogeneous initial conditions $x_i(t = 0) = x(0)$ for all i. For regular networks of degree K one then has
$x_i(t) = x(t)$ for all i, and $x(t)$ fulfills

FIG. 8: (Colour on-line) Escape times of learning in the networked best-shot game ($a = 7$, $b = 1$, $\beta = 0.01$). Simulations
are performed on a fixed regular graph with $M = 10$ players and $K = 3$, created at random. All runs are started at the
homogeneous fixed point $x_i = 1/2$; the escape time is defined as the first time in each run when $\sum_{i=1}^{M} (x_i(t) - 1/2)^2 =
10^{-3}$. Results are averaged over 20 independent runs (all on the same realisation of the graph). The most negative
eigenvalue of the adjacency matrix of this specimen graph is approximately -2.596, so that Eq. (37) predicts $r_c \approx 1.298$.
An exponent of approximately 1.3 is found in a power law fit to the escape time at $r = r_c$ (dashed line).



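$$\dot{x} = \beta x (1 - x) \left[ (a + b)(1 - x)^K - b + r \ln \frac{1 - x}{x} \right]. \qquad (38)$$

Linearising about the fixed point, and restricting the motion to the space $\{x_i = x_j\}$, one finds

$$\dot{\tilde{x}} = -\beta \left( \rho K + r \right) \tilde{x}, \qquad (39)$$

where $\tilde{x} = x - x^*$ denotes the deviation from the fixed point.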



Given that $\rho K > 0$ the interior fixed point is therefore stable irrespective of the value of the parameter
$r > 0$, similar to what was observed in the Hawk-Dove game. The network game considered in this section
therefore bears close similarity to the Hawk-Dove game discussed earlier on. In a one-population setting
(equivalently upon restricting the dynamics to the subspace $x_i(t) = x(t)$) the deterministic dynamics has a
stable internal fixed point for any $r > 0$. In the multi-population case the fixed point remains unchanged, but
unstable eigenvalues are present for $r < r_c$. The corresponding eigendirections break the symmetry between
the different co-ordinates $\{x_i\}$, and hence the flow is away from the manifold defined by $x_i(t) = x(t)$.
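
The stability threshold can be evaluated for any given regular graph once the most negative eigenvalue of its adjacency matrix is known. The sketch below is an illustration under stated assumptions: it takes the threshold to be $r_c = -\mu\rho$ with $\rho = (a + b) x^* (1 - x^*)^K$, and it uses the Petersen graph (10 nodes, degree 3, adjacency spectrum 3, 1, -2) purely as a convenient example, not the random graph used in the simulations. The eigenvalue is found by power iteration on $K I - A$, whose largest eigenvalue is $K - \mu$. All function names are hypothetical.

```python
import math


def petersen_neighbours():
    """Adjacency lists of the Petersen graph (10 nodes, degree 3)."""
    edges = [(i, (i + 1) % 5) for i in range(5)]            # outer cycle
    edges += [(i, i + 5) for i in range(5)]                  # spokes
    edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]    # inner pentagram
    nbrs = [[] for _ in range(10)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def most_negative_eigenvalue(nbrs, degree, iters=500):
    """Smallest adjacency eigenvalue of a regular graph via power iteration."""
    n = len(nbrs)

    def apply_b(vec):
        # Action of B = degree * I - A on a vector.
        return [degree * vec[i] - sum(vec[j] for j in nbrs[i]) for i in range(n)]

    v = [math.sin(i + 1.0) for i in range(n)]  # generic starting vector
    for _ in range(iters):
        w = apply_b(v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    top = sum(vi * wi for vi, wi in zip(v, apply_b(v)))  # Rayleigh quotient
    return degree - top


def critical_r(a, b, x_star, degree, mu_min):
    """Threshold r_c = -mu * rho with rho = (a + b) x* (1 - x*)^K."""
    rho = (a + b) * x_star * (1.0 - x_star) ** degree
    return -mu_min * rho
```

For a = 7, b = 1, K = 3 and $x^* = 1/2$ one has $\rho = 1/2$, so the Petersen graph ($\mu = -2$) gives $r_c = 1$, while an eigenvalue of -2.596 gives $r_c = 1.298$.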



D. Test against simulations 



The above theoretical predictions can be tested in several possible ways. For example one can consider the
thermodynamic limit of large regular random networks of degree K and then perform an average over multiple
instances of the graph. According to results from spectral graph theory [McKay 1981, Cioaba 2006], the support of
the eigenvalue distribution of the adjacency matrix of a large regular random graph of degree K typically has
its most negative eigenvalue at

$$\mu \approx -2\sqrt{K - 1}. \qquad (40)$$

With this estimate the expected value of $r_c$ can then be computed by means of Eq. (37). Simulations of the
learning dynamics on large networks are however time consuming, and we have therefore taken a different

FIG. 9: (Colour on-line) Fixation in learning in the networked best-shot game ($a = 7$, $b = 1$). Simulations are performed
on a fixed regular graph with $M = 10$ players and $K = 3$, created at random. The black dashed line indicates the
evolution of the deterministic replicator dynamics (RD), started with homogeneous initial conditions (hom IC) $x_i = 0.7$
for all players i. Symbols denote the evolution of the corresponding stochastic learning dynamics (SD) with batch
size $N = 10$; for clarity, only three players are followed but all of them fixate. The coloured lines show the evolution,
for the same three players, of the deterministic replicator equations when the homogeneous initial conditions above
are slightly perturbed in the direction of the configuration to which the stochastic learning fixated (het IC). That is
to say, $x_i = 0.7 + \delta_i$, where $\delta_i$ is a random number of magnitude smaller than $10^{-3}$ which is positive if the configuration
reached by the stochastic learning has $x_i = 1$ and negative otherwise. If the perturbation is in any other direction, the
configuration to which the replicator equations fixate may be different from the one reached by stochastic learning.



route. We have created one particular instance of a regular random graph with $M = 10$ nodes and degree
$K = 3$. The adjacency matrix of this particular graph has then been diagonalised and the relevant eigenvalue
has been identified as $\mu \approx -2.596$. For convenience we have also chosen $a = 7$, $b = 1$, ensuring the value
$x^* = 1/2$ for the deterministic fixed point$^3$. Eq. (37) then predicts a change of stability at $r_c \approx 1.298$.
Measurements of the escape time of the best-shot game on this fixed sample of the graph are shown in Fig. 8.
Results are consistent with an algebraic dependence of the escape time $T_e$ on the batch size at $r = r_c$,
even though we note a slight discrepancy from the exponent of unity one would expect from the Langevin
approximation$^4$. Below $r_c$ the fixed point is unstable, and escape times are shorter, consistent with logarithmic
scaling in the batch size. At $r > r_c$ the fixed point is stable and escape is slow.

Fig. 9 illustrates the dynamical evolution under either the deterministic replicator equations or the stochastic
learning dynamics ($r = 0$). It shows how the system is driven to fixation by a non-homogeneous perturbation,
caused either by a slight heterogeneity in the initial conditions of the replicator equations or by the stochasticity
induced by a finite batch size in the learning dynamics. Specifically one finds that (i) the fixed point $x_i = x^*$ is
an attractor for homogeneous initial conditions$^5$ and that (ii) inhomogeneous initial conditions lead to a flow
towards the corners of strategy space.
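
Observations (i) and (ii) can be reproduced qualitatively by integrating the deterministic equations directly. The sketch below is an illustration under stated assumptions, not the code behind Fig. 9: it assumes the $r = 0$ dynamics $\dot{x}_i = \beta x_i (1 - x_i) [(a + b) \prod_{j \in \partial i} (1 - x_j) - b]$, uses a simple explicit Euler scheme with time measured in units of $1/\beta$, takes the Petersen graph as a stand-in 3-regular graph with 10 nodes, and perturbs the homogeneous initial condition by an arbitrary small deterministic amount. All names are hypothetical.

```python
import math


def petersen_neighbours():
    """Adjacency lists of the Petersen graph (10 nodes, degree 3)."""
    edges = [(i, (i + 1) % 5) for i in range(5)]            # outer cycle
    edges += [(i, i + 5) for i in range(5)]                  # spokes
    edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]    # inner pentagram
    nbrs = [[] for _ in range(10)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def euler_step(x, nbrs, a, b, beta_dt):
    """One explicit Euler step of the r = 0 dynamics; beta_dt = beta * dt."""
    new_x = []
    for i, xi in enumerate(x):
        prod = 1.0
        for j in nbrs[i]:
            prod *= 1.0 - x[j]  # probability that no neighbour contributes
        new_x.append(xi + beta_dt * xi * (1.0 - xi) * ((a + b) * prod - b))
    return new_x


def run(x0, a=7.0, b=1.0, beta_dt=0.01, steps=6000):
    """Integrate from initial condition x0 and return the final state."""
    nbrs = petersen_neighbours()
    x = list(x0)
    for _ in range(steps):
        x = euler_step(x, nbrs, a, b, beta_dt)
    return x


hom = run([0.7] * 10)                                             # hom IC
het = run([0.7 + 1e-3 * math.sin(i + 1) for i in range(10)])      # het IC
```

In this sketch the homogeneous run relaxes to $x_i = 1/2$ for all players, whereas the perturbed run leaves the homogeneous manifold and the individual $x_i$ spread out over the unit interval.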



3 This choice was made purely for analytical convenience, to ensure that the deterministic fixed point carries no dependence on 
r. If one were to use the best-shot game to model an actual public goods game one would probably choose b > a. 

4 We attribute these differences to the approximation implied by the continuous-time limit, and to potential effects of higher-order 
terms in the expansion in powers of N~ 

5 We expect that this no longer holds on graphs which are not regular, i.e. in which different nodes have different degrees. Such 
a heterogeneity can be expected to break the symmetry between the nodes. 
VI. CONCLUSIONS



In summary we have studied the fixation properties of simple reinforcement-type learning algorithms in the
context of different games, and have compared them to the outcome of evolutionary dynamics. The examples we
have chosen range from simple two-player two-action games to more complex multi-player games on networked
structures. Our main results can be summarized as follows: (i) Unlike evolutionary dynamics, in which fixation
can occur purely driven by fluctuations, fixation (i.e. convergence to pure strategies) in learning of games of the
type we have studied here appears to be possible only if the underlying deterministic dynamics itself converges
to a pure action profile. This is typically only the case if the symmetry between players is broken, for example
by an inhomogeneous initial condition; (ii) Two-player and multi-player learning in the deterministic limit
can, to a good approximation, be described by equations of a multi-population replicator type. As seen for
the Hawk-Dove game, and for the network game we have studied, the stability properties of multi-population
replicator dynamics can differ substantially from those of the corresponding one-population model; (iii) The
role of noise in fixation processes in dynamical learning is mostly limited to triggering the required breaking
of symmetry, eventually leading to fixation. Unlike in evolutionary processes, we have not found examples in
which fixation is triggered by random drift alone; (iv) In cases for which the limiting deterministic learning
converges to a symmetric fixed point in the interior of strategy space the corresponding escape time depends on
the stability of this fixed point: for stable fixed points the escape time grows essentially exponentially with the
batch size, whereas for unstable fixed points logarithmic scaling in the batch size is found. These findings are
very similar to those found in evolutionary systems.

While we have pointed out crucial differences between multi-player learning and evolutionary dynamics, our
results mostly extend the similarities between the two approaches to dynamical aspects of games. In [Galla 2009]
it was pointed out that stochastic learning can exhibit persistent quasi-cycles in regimes where deterministic
learning converges to fixed points. These effects are very similar to those observed in evolutionary systems.
The present work shows that the analogy goes further, and that the escape and fixation properties of stochastic
learning dynamics are closely related to the corresponding behaviour of population models. Our analysis shows
that the analogy is particularly strong when learning is compared with evolution in multiple populations. We
expect that these similarities stretch even further, potentially including pattern formation in spatially extended
systems and/or more complicated dynamics on adaptive networks. Future work may also include more complex
learning models [Camerer & Ho 1999, Camerer 2003, Chong et al 2007] inspired by laboratory experiments in
behavioural game theory or by algorithms in machine learning. We are confident that the analogy with
stochastic evolutionary systems may provide a powerful perspective and that it can contribute to accelerating
the research required to analyze more general learning dynamics.
Appendix: Escape time for linear Langevin dynamics
A. Reduced problem and backward Fokker-Planck equation 

Let us consider the simple 1D problem

$$\dot{x} = \lambda x + \sqrt{2D}\, \eta, \qquad (41)$$

where $\eta(t)$ is standard Gaussian white noise, i.e. $\langle \eta(t) \rangle = 0$ and

$$\langle \eta(t)\, \eta(t') \rangle = \delta(t - t'). \qquad (42)$$

We consider escape times: fix a number $R > 0$; if the process is started at $t = 0$ at $x(0) = 0$, then the escape
time is defined as the first time the process leaves the interval $[-R, R]$.
Using standard methods [Gardiner 2009, Risken 1996] one finds the backward Fokker-Planck equation

$$\lambda x\, g'(x) + D\, g''(x) = -1, \qquad (43)$$

where g(x) is the mean escape time, conditioned on a starting point x. Boundary conditions read $g(R) =
g(-R) = 0$ (if the process is started at the boundaries of the interval, the escape time is trivially zero). It
is easy to check that the solution is given by

$$g(x) = \frac{1}{D} \int_x^R dy\, e^{-\lambda y^2/(2D)} \int_0^y dz\, e^{\lambda z^2/(2D)}. \qquad (44)$$

Restricting to a starting point of $x = 0$ one has

$$g(0) = \frac{1}{D} \int_0^R dy\, e^{-\lambda y^2/(2D)} \int_0^y dz\, e^{\lambda z^2/(2D)}. \qquad (45)$$



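By introducing the change of variables $z = R\sqrt{(1 - u)v}$ and $y = R\sqrt{v}$, introducing the appropriate Jacobian
and setting $w_N = \lambda R^2/(2D)$, we can rewrite the integrals on the RHS of Eq. (45) as

$$g(0) = \frac{R^2}{4D} \int_0^1 dv \int_0^1 du\, \frac{e^{-w_N u v}}{\sqrt{1 - u}} = \frac{R^2}{2D}\; {}_2F_2\!\left(1, 1; \tfrac{3}{2}, 2; -w_N\right), \qquad (46)$$

where

$${}_2F_2(a_1, a_2; b_1, b_2; z) = \sum_{k=0}^{\infty} \frac{(a_1)_k\, (a_2)_k}{(b_1)_k\, (b_2)_k}\, \frac{z^k}{k!}, \qquad (47)$$

with $(a)_k$ the Pochhammer symbol, is a generalised hypergeometric function whose asymptotic behaviour is well known [DLMF 2010]. For
simplicity, we will in the following give a heuristic derivation of the asymptotic leading behaviour of $g(0)$ at large
values of N. Our starting point will be the expression given in Eq. (45).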



B. Large-$N$ behaviour for $\lambda > 0$

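Setting $2D = \sigma^2/N$, so that $w_N = N\lambda R^2/\sigma^2$, and changing variables in (45) to $u = z + y$ and $v = z - y$ one
has

$$g(0) = \frac{N}{\sigma^2} \int_{-R}^{0} dv \int_{-v}^{2R + v} du\, e^{N\lambda u v/\sigma^2}. \qquad (48)$$

Executing the integration over u and setting $w = N\lambda R |v|/\sigma^2$ one finds

$$g(0) = \frac{1}{\lambda} \int_{\delta}^{w_N} \frac{dw}{w}\, e^{-w^2/w_N} - \frac{1}{\lambda} \int_{\delta}^{w_N} \frac{dw}{w}\, e^{-2w + w^2/w_N}, \qquad (49)$$

where we have introduced the lower integration limit $\delta$. This variable will be set to zero eventually; the purpose
of our procedure is to keep track of singularities at small values of w.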

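When $\lambda > 0$ and N is very large the main contribution to the integral in (49) comes from small values of w due
to the decaying exponential factors. Notice that the term $w^2/w_N$ in the second integral in (49) is only relevant
when $w \sim w_N$, in which case the exponent $-2w + w^2/w_N \approx -w_N$ and so the integrand is exponentially small
in $w_N$. Therefore, we can neglect the term $w^2/w_N$ in the exponent relative to w. Doing this and introducing
the variable $\Omega = w^2/w_N$ in the first integral and $\Omega' = 2w$ in the second, we obtain

$$g(0) \approx \frac{1}{2\lambda} \int_{\delta^2/w_N}^{w_N} d\Omega\, \frac{e^{-\Omega}}{\Omega} - \frac{1}{\lambda} \int_{2\delta}^{2 w_N} d\Omega'\, \frac{e^{-\Omega'}}{\Omega'}. \qquad (50)$$

Both integrals on the RHS are of the type $\int_{\epsilon}^{x} d\Omega\, e^{-\Omega}/\Omega$, where $\epsilon = \delta^2/w_N$ in the first case, and $\epsilon = 2\delta$ in the
second. In both cases $\epsilon$ tends to zero as $\delta \to 0$. The upper integration limit x is proportional to $w_N$ in both
integrals. Integrals of this type can be simplified by an integration by parts, and we find

$$\int_{\epsilon}^{x} d\Omega\, \frac{e^{-\Omega}}{\Omega} = \int_{\epsilon}^{\infty} d\Omega\, \frac{e^{-\Omega}}{\Omega} - \int_{x}^{\infty} d\Omega\, \frac{e^{-\Omega}}{\Omega} = \left. e^{-\Omega} \log \Omega \right|_{\epsilon}^{\infty} + \int_{\epsilon}^{\infty} d\Omega\, e^{-\Omega} \log \Omega - \int_{x}^{\infty} d\Omega\, \frac{e^{-\Omega}}{\Omega}. \qquad (51)$$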
The last integral in (51) is exponentially small in $x \propto w_N$, as one can see from

$$\int_{x}^{\infty} d\Omega\, \frac{e^{-\Omega}}{\Omega} < \int_{x}^{\infty} d\Omega\, e^{-\Omega} = e^{-x}, \qquad (52)$$

valid for $x > 1$. We can therefore neglect the last term in Eq. (51) in the limit of $N \to \infty$, when we have
$w_N \to \infty$. Performing the limit $\delta \to 0$ (and taking into account that $\epsilon \to 0$ in this limit) we find

$$\int_{\epsilon}^{x} d\Omega\, \frac{e^{-\Omega}}{\Omega} \approx -\log \epsilon - \gamma_E, \qquad (53)$$

where $\gamma_E = -\int_0^{\infty} d\Omega\, e^{-\Omega} \log \Omega = 0.57721\ldots$ is the Euler-Mascheroni constant. Using these results in Eq. (50),
we finally find

$$g(0) \approx \frac{1}{2\lambda} \left( \log w_N + \gamma_E + \log 4 \right). \qquad (54)$$

Except for the additional log 4 term, this result coincides with the asymptotic results for escape times reported
in [Mobilia 2010] for the case of a two-dimensional system.
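
The logarithmic asymptote can be compared against a direct numerical evaluation of the mean escape time. The sketch below is a rough check under stated assumptions: it takes the single-integral representation $g(0) = \lambda^{-1} \int_0^{w_N} (dw/w) [e^{-w^2/w_N} - e^{-2w + w^2/w_N}]$, i.e. the difference of the two integrals above with $\delta \to 0$ (the combined integrand is finite at $w = 0$), and evaluates it with a simple midpoint rule. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def mean_escape_time(lam, w_n, steps=200000):
    """g(0) for lam > 0 from the single-integral form, using the midpoint rule."""
    h = w_n / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h  # midpoints avoid the removable singularity at w = 0
        total += (math.exp(-w * w / w_n) - math.exp(-2 * w + w * w / w_n)) / w
    return total * h / lam


def asymptotic_escape_time(lam, w_n):
    """Leading large-N behaviour (log w_N + gamma_E + log 4) / (2 lam)."""
    return (math.log(w_n) + EULER_GAMMA + math.log(4)) / (2 * lam)
```

For $\lambda = 1$ and $w_N = 200$ the two expressions should agree to well within one per cent; the remaining difference is consistent with corrections of relative order $1/w_N$ that the heuristic derivation neglects.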



C. Large-$N$ behaviour for $\lambda < 0$

For $\lambda < 0$ and N large, we have from Eq. (45)

$$g(0) = \frac{2N}{\sigma^2} \int_0^R dy \int_0^y dz\, e^{N|\lambda|(y^2 - z^2)/\sigma^2} \approx \frac{2N}{\sigma^2} \int_0^R dy\, e^{N|\lambda| y^2/\sigma^2}\; \frac{1}{2} \int_{-\infty}^{\infty} dz\, e^{-N|\lambda| z^2/\sigma^2}. \qquad (55)$$

We have here first extended the integration range of the z-integration to the interval $[-y, y]$, adjusted for by
an overall factor of 1/2. Subsequently, based on the observation that the integrand assumes its maximum at
$z = 0$, we have extended the integration range of the z-integral to the entire real axis. This introduces an
error which does not contribute to the leading exponential behaviour for large N. The integral over z is now
Gaussian, and can be evaluated straightforwardly. The exponent in the remaining y-integration reaches its
maximum at $y = R$. Expanding the exponent up to first order about this value we get

$$g(0) \approx \frac{1}{2} \sqrt{\frac{\pi \sigma^2}{N|\lambda|}}\; \frac{2N}{\sigma^2} \int_0^R dy\, e^{N|\lambda| \left[ R^2 + 2R(y - R) \right]/\sigma^2} \approx \frac{1}{2|\lambda|} \sqrt{\frac{\pi}{|w_N|}}\; e^{|w_N|},$$

with $|w_N| = N|\lambda| R^2/\sigma^2$.



